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The general time-reversible (GTR) model (Tavare, 1986) has been the workhorse of 
molecular phylogenetics for the last decade. GTR sits at the top of the ModelTest 
hierarchy of models (Posada & Crandall, 1998) and, usually with the addition of 
invariant sites and a gamma distribution of rates across sites, is currently by far 
the most commonly selected model for phylogenetic inference (see Table Q. 
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GMM 


1 


1 


1 


18 


41 


90 



Table 1: Popularity of phylogenetic models of DNA substitution as measured 
by number of hits in Google Scholar on the search terms shown in bold (search 
conducted 23/08/2011). Note that there are a small number of false positives. 



JC69 is the Jukes-Cantor model (Jukes & Cantor, 1969), K80 is the Kimura two 



parameter model (Kimura, 1981), F81 is the Felsenstein model ( Felsenstein 1981), 



model (Barry & Hartigan, 1987). 



HKY was introduced in Hasegawa et al. (1985), and GMM is the general Markov 
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However, a recent publication (Sumner et al, 2011) shows that GTR, along 



with several other commonly used models, has an undesirable mathematical prop- 
erty that may be a cause of concern for the thoughtful phylogeneticist. In math- 
ematical terms, the problem is simple: matrix multiplication of two GTR substi- 
tution matrices does not return another GTR matrix. It is the purpose of this 
article to give examples that demonstrate why this deficit may pose a problem for 
phylogenetic analysis. 

Consider a single molecular sequence where each site evolves under the same, 
albeit heterogeneous, Markov process (we assume that sites evolve independently 
and under identical conditions). For time t = to £1, the sequence evolves under 
a time-homogeneous Markov process so that substitutions are governed by a rate- 
matrix Qi whose ijih entry is the rate at which state i changes to state j, so 
that the corresponding probability substitution matrix is given by Mi = e^ 1 * 1 . 
Then a breakpoint occurs, and over time £2 the sequence again evolves under a 
time-homogeneous Markov process but with a different rate-matrix Q2 governing 
the substitution rates, so that the corresponding substitution matrix for this time 
period is M 2 = e^ 2 * 2 . As a consequence of the Markov assumption for the overall 
process, the substitution matrix that describes the probability of substitutions 
between time t — and t — 1\ + 1% can be expressed as the matrix product M = 
M\M.2. So far, so good, but then two questions naturally arise: 

i. Is there a single rate-matrix Q that can be used to describe the process 

from the time t = to t = t\ + ti as a homogeneous Markov process with 
e Q(ti+t 2 ) _ 

ii. If Qi and Q2 are in a particular model class, will Q necessarily belong to the 
same class? 



Sumner et al. (2011) show that the answer to the first question is "yes" (by con- 
firming that the general Markov model is closed under matrix multiplication), and 
that the answer to the second question is "sometimes" depending on the class of 
model considered. Crucially, for the case of the GTR model class they show that 
the answer to the second question is "no" . 

Why does this lack of closure under matrix multiplication for the GTR model 
matter for phylogenetics? In almost all standard phylogenetic studies a model is 
used that assumes a homogeneous Markov process generated the molecular data. 
This is implemented by taking a fixed rate-matrix to apply at all times and on all 
lineages of the evolutionary tree, and helps to maximize statistical power by re- 
ducing the number of free parameters present and thus keeping the corresponding 
estimation variance to a minimum. However, the thoughtful phylogeneticist does 
not necessarily believe that the truth of the matter is that the random process re- 
mained fixed through time and across the lineages of the evolutionary tree. Rather, 
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biological reality is likely to consist of some form of time-dependent and lineage- 
specific evolution governed by varying substitution rates. Under the assumption 
of a Markov process, this corresponds mathematically to the rate-matrix changing 
across the evolutionary tree. For example, it has long been known that base com- 



position is not fixed throughout evolutionary history ( 


Foster et al. 


1997 


Chang 


& Campbell 


2000 


Lockhart et al., 


1992 


Phillips et al. 


2004 


Tamo et al. 


2001), 



and that there is evidence that the Markov process varies across the evolutionary 



tree (Herbeck et al. 2005 Ota & Penny, 2003). 



So, it is apparent that the thoughtful phylogeneticist already thinks of her 
model as a statistical pragmatist's average of the true heterogeneous process. In 
this light, multiplicatively closed models are then simply those where it is possible 
to assume a homogeneous model and "average out" these effects in a mathemati- 
cally consistent way. 

Before discussing how the non-closure of GTR may affect phylogenetic esti- 
mation, we think it is worth reflecting on how GTR made its way to the top of 
the ModelTest hierarchy. Reversibility of a continuous-time Markov chain X(t) 
on some time interval t G [0,T] arises by considering the "time-reversed process" 
Y(t) = X(T — t), and demanding that the probability of observing X(T) =j given 
X(0) = i is identical to the probability of observing Y(T) = i given Y(0) = j. In 
this case, it is easy to show that the rate-matrix for the time-reversed process 
Y(t) is exactly the same as that for the original process, and the process is said 
to be "time-reversible" or simply "reversible" . Importantly, time-reversibility has 
long been held out as important for phylogenetic analysis because implementa- 
tion of Felsenstein's "pruning" algorithm for calculating likelihoods (Felsenstein 
1981) relies on the so-called "pulley-principle" and the process being reversible. 



However, recent authors have developed likelihood algorithms for non-reversible 



processes (Boussau & Gouy, 2006; Oscamou et al., 2008; Sumner & Charleston 



2010), and, besides, we argue that algorithmic convenience is not the only criterion 



that the thoughtful phylogeneticist may wish to take into account when choosing 
appropriate models. 

In case the concerns outlined above are seen to be somewhat philosophical, we 
now show that lack of multiplicative closure of a model can result in systematic 
biases for phylogenetic inference If we use a model where the true heterogeneous 
process cannot be exactly represented homogeneously under the same model, then 
it is possible that some error in estimation of substitution rates and/or specia- 
tion times will occur. A computer simulation was conducted to test how severe 
this misestimation could be for the simple situation of a single sequence evolv- 
ing under the GTR model with a breakpoint where the substitution process was 
allowed to change. As per the situation described above, GTR rate-matrices Qi 
and Q2 applied over times t\ and £2 respectively, giving the substitution matrices, 
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Mi = e^ 1 * 1 and M 2 = e^ 2 * 2 , and M = M\M 2 represents the transition matrix for 
the entire edge. We normalized the rate-matrices Q\ and Q 2 such that the sum 
of the (off-diagonal) rates was equal to 1, this allows edge lengths to be compared 
in a consistent way. We then sought to find a (similarly normalized) GTR class 
rate-matrix Q and time t, such that the "distance" from e^* to M was minimized. 
For this purpose we used d(M, N) = ~ [N]^) 2 ) 1 / 2 , as the distance be- 

tween any two substitution matrices M and N (where it should be noted that the 
constraint i ^ j in the summation is intentional; reflecting that we are effectively 
calculating the Euclidean distance between substition matrices as embedded in 
the ambient vector space of GMM substitution matrices). We also used the same 
distance to determine the distance between the two rate matrices Q\ and Q 2 , to 
give a measure of how "heterogeneous" the process was in each case. In each it- 
eration of the simulation we select two rate-matrices, Qi and Q 2 , from the GTR 
model class by sampling substitution rates randomly in such a way as to ensure 
a reasonably consistent spread between almost homogeneous and highly heteroge- 
neous. All calculations were performed in the statistical computing package R (Ir| 



Development Core Team, 2006) and its in-built optimisation routine "optim" 



To assist in comparison, we repeated the above simulation for the known closed 



model F81 (see Sumner et al. (2011) for details). F81 was chosen as, despite it 



being guaranteed that Q will also belong to the F81 model class (and hence our 
routine should find Q = Q), it is also not the case that Q is simply given by the 
weighted average of Qi and Q 2 (which should be compared to other closed models, 
such as JC69 and K80, where Q is given by the weighted mean). For this reason, 
F81 was deemed as a fair counterpoint to the GTR model. For all simulations, 
ti and t 2 were chosen to be 0.5, such that t± + 1 2 — 1, and i was compared to the 
actual time proportionally. Figure [T] shows the level of error in time estimation for 
both the closed F81 model and the unclosed GTR model. 

The results in Figure [T] suggest that the F81 model correctly estimates the 
time parameter to within the tolerance of the optimisation routine. However, for 
GTR we found up to a 19% under-estimation and 14% over-estimation of the time 
parameter, although this occurred in cases where the breakpoint introduced a large 
change in the rate-matrix. The magnitude of the error in time estimation increases 
as the distance between the two input rate-matrices increases. Another trend in 
the GTR data is that the points tend to underestimate more than overestimate 
as the input parameter distance increases. Figure [2] shows this by redisplaying 
the data from figure [T] using a boxplot for each increment of 0.1 in the distance 
between the two input rate-matrices. Finding a theoretical explanation for this 
tendancy for underestimation is currently an open problem. 

Having decided that lack of closure is a problem for phylogenetic models, the 
obvious question that arises is what are the closed models? These "good" models 
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Proportion of time error vs. change in rate matrix (GTR vs. F81) 





Distance between rate matrices Q1 and Q2 



Figure 1: Proportion of error in the time estimate for the homogeneous represen- 
tation of a truly heterogeneous situation increases significantly as the amount of 
change in rates increases at the breakpoint. BLUE points show results for F81 (a 
closed model), and RED points show results for GTR. 



are not yet fully known, and Sumner et al. (2011) discuss that in general this is 
quite a difficult and subtle mathematical problem that crosses the boundaries of 
Markov chain and Lie group theory. However, they do give a method that shows 
how to generate a complete list of models that have a certain invariance proper- 
ties under permutations of nucleotide states. In particular they give a complete 
hierarchy of closed models with 4-way nucleotide symmetry; that is, models that 
do not prefer any particular groupings of nucleotides, or to put it another way, 
models where for any relabelling (permutation) of the states it is possible to find 
a relabelling (permutation) of the substitution parameters such that the model 
is unchanged. The hierarchy they present contains GMM (the general Markov 
model), K3ST, F81 and JC, but includes a newly-discovered 6-parameter closed 
model dubbed "K3ST+F81", as it combines features of both these models. Work 
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Proportion of time error vs. distance between rate matrices (GTR) 




0.0 0.2 0.4 0.6 0.8 

'Distance' from Q1 to Q2 



Figure 2: Boxplots of the GTR model showing the time error increasing as the 
distance between the input parameters at a breakpoint increases. There appears 
to be a tendency towards underestimation of edge lengths. 



on cataloguing the "good" models for other symmetries is underway, with the 
method given in Sumner et al. (2011) being applied to generate the list of closed 
models that have the transition/transversion substition symmetry. For example, 
K2P is an example of a model with this symmetry that is known to be closed, 
whereas HKY has the same symmetry but is not closed. 

As a final example, we compared the performance of the closed model K3ST+F81 
to GTR, SYM (the GTR model with uniform base distribution), and HKY. These 
initial comparisons of the K3ST+F81 model to models which are not multiplica- 
tively closed show that it tends to give sometimes better and sometimes worse like- 
lihoods. For five different datasets a tree topology was generated using neighbour- 
joining and then, fixing this tree topology, we performed likelihood under each 
model, fitting rate substitutions and edge lengths. The likelihood values for the 
various models and datasets are presented in Table [2j With the caveat that it 
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Model 


Human 


Acorus 


Cormorants 


Yeast 


Fish 


F+K (5) 


-1557.76 


-451396.98 


-7014.46 


-710065.35 


-14534.49 


HKY (4) 


-0.72 


355.55 


14.38 


365.03 


8.50 


SYM (5) 


-0.45 


-1990.37 


-3.11 


-3695.09 


5.70 


GTR (8) 


4.14 


862.60 


40.76 


2409.14 


38.86 



Table 2: Log-likelihoods of various phylogenetic models on sample data sets. The 
number of free parameters for each model is given in parentheses (with the count 
reduced by one after overall scaling is taken into account). The values for the 
F81+K3ST model (shortened here to "F+K") are the actual log likelihoods, the 
other values are relative to this (with positive values indicating a larger likelihood). 



The data sets are: human, with 53 taxa, 202 sites, of mitochondrial genes (Ingman 



et al. 2000); Acorus, with 15 taxa, 89436 sites, from the chloroplast genomes 
(Goremykin et al., 2005); Cormorants, with 33 taxa, 1141 sites, from a mix of 



mitochondrial and nuclear genes (Holland et al., 2010); Yeast, with 8 taxa, 127026 



sites, from mainly nuclear genes (Rokas et al. , 2003); Fish, with 11 taxa, 2178 sites 



from nuclear genes (Zakon et al, 2006). Note that differences in log-likelihood 



greater than k, where k is the difference in the number of parameters, would be 
considered significant under the Akaike Information Criterion. 



is not necessarily statistically meaningful to compare likelihood values for these 
different models (especially given that the models are not nested), it can be seen 
from Table [2] that GTR consistently outperforms the other models. The differ- 
ences in log-likelihood cannot be solely explained by its increased number of free 
parameters. For the models with comparable number of parameters we found 
that sometimes the best fit comes from our closed model and sometimes from a 
non-closed model. What conclusions can be drawn from this study? Our overall 
point is that it is not prudent to only look for models that fit the data "best" 
in a likelihood sense, but to also look for models that can be given a satisfactory 
interpretation. We claim that, particularly in cases where a heterogeneous process 
is suspected, priority should be given to a closed model even when a non-closed 
model gives a better likelihood. 

To summarize, we argue that lack of closure constitutes a serious problem for 
the use of the GTR model in phylogenetics, as it means that taking an average 
of inhomogeneous processes - which is almost certainly the underlying biological 
reality in many circumstances - is impossible to do in an accurate way. Further 
research is required to find credible closed alternatives to GTR that offer similar 
ability to fit phylogenetic data. It will also be important to determine closure for 
cases where inhomogeneous models are used explicitly, for example in codon models 
(Yang, 1998), which are used to test for positive selection, and, for instance, the 
work of Hamady et al. (2006) which uses detectable shifts in inferred rate-matices 
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to infer if genes have been horizontally transferred. 
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